Data Preprocessing

library(dplyr)
library(naniar)
library(pcalg)
library(ggplot2)
library(dagitty)
library("splines")

First, let’s have a look at the data set.

## Warning: NAs introduced by coercion
##   X     country year      US_ODA     US_OOF US_aid_total Rest_of_world_ODA
## 1 0 Afghanistan 2000    3.211893  0.0000000     3.211893          113.9824
## 2 1 Afghanistan 2001    9.987947  0.0000000     9.987947          431.5182
## 3 2 Afghanistan 2002  470.108307  0.0000000   470.108307          913.8090
## 4 3 Afghanistan 2003  609.058533  0.8149289   609.873474          938.7944
## 5 4 Afghanistan 2004  949.496277  1.8299663   951.326233         1235.8495
## 6 5 Afghanistan 2005 1557.600098 24.0675926  1581.667725         1139.0460
##   Rest_of_world_OOF Rest_of_world_aid_total polity2  oda_gni population
## 1        -0.1459959                113.8364      -7       NA   20.09376
## 2        -8.1849251                423.3333      -7 16.67001   20.96646
## 3       -35.2187729                878.5902      -7 31.79141   21.97992
## 4        -4.6263351                934.1680      -7 34.82247   23.06485
## 5       -36.8433380               1199.0061      -7 43.65011   24.11898
## 6       -57.4928169               1081.5532      -7 45.14558   25.07080
##   recipient_trade_world GDP_per_growth_rate population_growth_rate
## 1              1743.062                  NA                     NA
## 2              2288.148                  NA             0.04162397
## 3              3263.558          0.36601500             0.04610845
## 4              2814.666          0.03586068             0.04703817
## 5              3027.984          0.06804592             0.04370528
## 6              3372.937          0.09596804             0.03796530
##   GDP_total_growth_rate
## 1                    NA
## 2                    NA
## 3                    NA
## 4              8.832278
## 5              1.414118
## 6             11.229715

First, let’s visualize the distribution of the missing data.

we decide to drop the observations that have missing values for the following reason: 1. “MNAR: Missing Not at Random - the missing is not random, it correlates with unobservable characteristics unknown to a researcher.” The missing rows gather around the last 400 rows, so we cannot assume the missing is at random. 2. we can afford to drop 7.7% of the original data as we still have 92.3% data left.

To get away with the time correlated concern, we decide to base our analysis on 2007 and 2012 because they have the least missing data. We conduct analysis on one of them, and use the result of the other one to compare, which serves as a part of sensitive analysis to test the robustness of the former.

aid = aid %>%
  na.omit(aid)

aid2007 = aid %>%
  filter(year==2007)

aid2012 = aid %>%
  filter(year==2012)

The causal graph is constructed based on our prior assumptions and common sense:

  1. The outcome(Y: GDP growth rate of the recipient) depends on the national demand, population growth rate, the foreign aid received, potentially its autocracy-democracy index(polity2), and the recipient’s openness (measured by its global trade).

  2. We assume that our treatment, US ODA, affect the outcome via direct effect(the recipient receives the aid) and indirect effect of ODA_GNI. The idea for GNI is that given the same amount of foreign aid, if the aid amounts to a large portion of the recipient’s gross national income, this aid should have a larger impact on the recipient. A similar idea applies to the foreign aid provided by the rest of the world.

  3. We assume the recipient country’s openness to global trading contributes to not only GDP growth but also the amount of aid it’s going to receive. The rationale behind the assumption is the potential economic welfare gains from an aid could strongly motivate the donor. For example, the donor might want to have an impact on, or take a share of, the recipient’s market. Finally, we think it’s reasonable to assume that national demand affects the recipient’s global trade.

  4. The assumption we made polity2 a confounder that has an impact on a wide range of other variables. The rationale behind this assumption is that the donors might also be motivated by enlarging their political impact on the recipient, such as winning over political support.

Causal Discovery

The causal graph was constructed based on our prior knowledge, which entails that the relationship assumed could be inaccurate. Therefore, we adopt causal discovery algorithm to generate causal graphs which help us identify causal flows.

The core idea of the algorithm is that if conditional on all possible sets of other variables, there remains a significant relationship between variable A and B, then there exists a direct causal association between them. The direction of the arrow is determined by controlling the middle variable for every three connected variables.

alpha is the threshold for the signifiance level. If

alpha = 0.01

causalAid = aid2012 %>%
  dplyr::select(-X,-country,-year)
suff_stat <- list(C = cor(causalAid), n = nrow(causalAid))
pc_aid <- pc(suff_stat, indepTest = gaussCItest, labels = colnames(causalAid), alpha = 0.01, skel.method = "stable.fast")

plot(pc_aid, main = "")
## Loading required namespace: Rgraphviz

With a threshold of 0.01, the outcome is entirely seperated from the rest of variables. Therefore, we believe 0.01 is too strict and thus relax it to 0.05.

alpha=0.05

pc_aid <- pc(suff_stat, indepTest = gaussCItest, labels = colnames(causalAid), alpha = 0.05, skel.method = "stable.fast")
plot(pc_aid, main = "")

alpha=0.12

pc_aid <- pc(suff_stat, indepTest = gaussCItest, labels = colnames(causalAid), alpha = 0.12, skel.method = "stable.fast")
plot(pc_aid, main = "")

Comparing the resulted causal graph with our assumptions, we spot the following causal flow that we didn’t identify before:

  1. it’s mainly ODA, instead of OOF, that contributes to other variables. For example, the graph shows that US_ODA has a direct effect on popolucation growth rate, which stimulates GDP growth. Rest_of_Work_ODA also contributes to population, which directly affects recipient’s world trade.
  2. US_ODA has a direct effect on Rest_of_Work_ODA.
  3. oda_gni directly affects recipient_trade_world.

As we relax the treshold from 0.01 to 0.12, there are a few changes we think might be of interests:

  1. when threshold is 0.01, recipient_trade_world contributes to population. When threshold increases, it’s polulation contributes to recipient_trade_world. Therefore, both directions worth exploring in our later sensitivity analysis.
  2. when threshold increases to 0.12, polity2 is considered to contribute to population growth.

It appears that the way variables affect the outcome is via population growth.

Model Construction informed by causal discovery

Informed by the causal discovery algorithm, we visualize the relationships that are suggested by the causal discovery algorithm in the below section.

We’d like to see

  1. Since the generated causal graph suggests US_ODA \(\rightarrow\) Rest_of_World_ODA and no arrow point from Rest_of_World_ODA to the outcome, which is counterintuitive. we’d like to see the relationship between US_ODA, Rest_of_World_ODA, and the growth rate outcome.

  2. We were surprised to see the causal graph doesn’t discern a significant causal flow from Rest_of_World_ODA to the outcome, so we want to visually check if Rest_of_World_ODA is indeed not a mediator, aka if Rest_of_World_ODA doesn’t directly contribute to the outcome.

  3. When we relax the alpha threshold, we observe that polity2 and US_ODA now both contribute to the outcome. This is consistent with our prior knowledge. Therefore, we want to check if we should introduce interaction term for polity2 and US_ODA. Interaction term means given the same US_ODA, if the recipient will have different growth because they have different polity2. In other words, if the recipient will react differently towards the same treatment.

US_ODA,Rest_of_World_ODA, and GDP_total_growth_rate

How GDP_total_growth_rate is affected by US_ODA and Rest_of_world_ODA?

ggplot(data = aid2012, aes(x = US_ODA, y = GDP_total_growth_rate)) +
    geom_point(aes(color = Rest_of_world_ODA), size = 2) +
    geom_smooth(se = FALSE, color = "blue")+
  coord_cartesian(xlim = c(0,1000))+
  scale_color_gradient(low = "yellow", high = "darkblue")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

From the plot, we can tell U.S. ODA and rest of world ODA are correlated, which verifies the generated causal graph. Looking at the y-axis of the plot, the ligher colored points mainly gathered at the bottom while darker ones take the upper part. We vaguely get a sense that both world ODA and U.S. ODA positively contribute to GDP total growth rate, but we admit that rest of world ODA’s contribution(pattern of the color) is unobvious.

Observing the effect of U.S. ODA, we can see that there seem to be an increasing trend between 0-250, and another decreasing trend for US_ODA that’s larger than 250. Instead of keep using continuous value for treatment, we decide to cut U.S ODA into two categories: [0,250] and [251,+).

aid2012 <- aid2012%>%
  mutate(US_ODA_C = ifelse(US_ODA<250, 0,1))
Does Rest_of_world_ODA_total affect the outcome?
ggplot(data = aid2012, aes(x = Rest_of_world_aid_total, y = GDP_total_growth_rate)) +
    geom_point() +
    geom_smooth(se = FALSE, color = "blue")+
    geom_smooth(formula = y~ns(x,2), method = "lm",
        se = FALSE, color = "red"
    )+
  coord_cartesian(xlim = c(0,2000))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Earlier, we didn’t see a causal flow from rest of world aid to the outcome, which is implausible from common sense. However, the seemingly positive slope of the bluel line suggests that greater Rest_of_world_ODA_total has slightly higher growth rate, although it’s clearly insignificant. However, we are still justified to believe rest of world aid might open a causal path: US_ODA \(\rightarrow\) rest of world aid\(\rightarrow\) GDP_total_growth_rate.

Does polity2 interact with US_ODA?
ggplot(data = aid2012, aes(x = US_ODA, y =polity2 )) +
    geom_point(aes(color = GDP_total_growth_rate), size = 2) +
  scale_color_gradient(low = "yellow", high = "darkblue")+
    geom_smooth(se = FALSE, color = "blue")+
  coord_cartesian(xlim = c(0,1000))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The colored scatter points don’t seem to form a clear pattern. Therefore we DON’t assume an interaction between polity2 and US_ODA.

Regression Model

Regression Results and Interpretation

We want to see if the variables selected should be linear or not; therefore, we would first investigate the association between the controlled variables and the outcome to decide if we want to introduce non-linearity in our regression. However, we would also build a linear model as comparison for us make a final choice between those two models.

Choice of Linearity for the controlled variables

1. Population growth rate vs. GDP growth rate
ggplot(data = aid2012, aes(x = population_growth_rate, y = GDP_total_growth_rate )) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,3), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded

According to the graph above, we might incorporate population growth rate as an non-linear variable in our model.

2. Trade balance of recipient countries vs. GDP growth rate
ggplot(data = aid2012, aes(x =log(recipient_trade_world), y = GDP_total_growth_rate )) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,2), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded

Since the trade balance of recipient countries is relative big number compared to the foreign aid, we want to use the natural log of the trade balance so that we are able to see numbers on the same scale; also, our analysis would not misled by the big numbers. In this graph, we can see that the natural log of trade balance of recipient countries is non-linear.

3. Politics score vs. GDP growth rate
ggplot(data = aid2012, aes(x = polity2, y = GDP_total_growth_rate )) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,3), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded

Regression Models

Non-linear regression model
mod1 <- lm(GDP_total_growth_rate~ as.factor(US_ODA_C) +ns(log(recipient_trade_world),2)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2012)
summary(mod1)
## 
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_ODA_C) + ns(log(recipient_trade_world), 
##     2) + ns(polity2, 3) + ns(population_growth_rate, 3), data = aid2012)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0556 -1.9284 -0.3254  1.6768 11.0134 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                          0.3742     3.0684   0.122  0.90318    
## as.factor(US_ODA_C)1                -1.4542     0.9608  -1.514  0.13322    
## ns(log(recipient_trade_world), 2)1   7.0013     2.6501   2.642  0.00954 ** 
## ns(log(recipient_trade_world), 2)2   1.4048     1.3955   1.007  0.31646    
## ns(polity2, 3)1                     -0.5196     1.3877  -0.374  0.70886    
## ns(polity2, 3)2                     -5.4723     3.6145  -1.514  0.13312    
## ns(polity2, 3)3                      0.2544     1.0984   0.232  0.81732    
## ns(population_growth_rate, 3)1       7.6467     1.5332   4.987 2.52e-06 ***
## ns(population_growth_rate, 3)2       5.0143     4.7620   1.053  0.29483    
## ns(population_growth_rate, 3)3       1.0454     3.0372   0.344  0.73141    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.187 on 102 degrees of freedom
## Multiple R-squared:  0.2938, Adjusted R-squared:  0.2315 
## F-statistic: 4.715 on 9 and 102 DF,  p-value: 3.032e-05
confint(mod1)
##                                         2.5 %     97.5 %
## (Intercept)                         -5.711927  6.4602844
## as.factor(US_ODA_C)1                -3.359837  0.4514714
## ns(log(recipient_trade_world), 2)1   1.744870 12.2577906
## ns(log(recipient_trade_world), 2)2  -1.363057  4.1726871
## ns(polity2, 3)1                     -3.272090  2.2328938
## ns(polity2, 3)2                    -12.641748  1.6970599
## ns(polity2, 3)3                     -1.924319  2.4330696
## ns(population_growth_rate, 3)1       4.605608 10.6877901
## ns(population_growth_rate, 3)2      -4.431032 14.4595964
## ns(population_growth_rate, 3)3      -4.978882  7.0696293

Benefits of Countries which received U.S ODA < 250 million dollars: 0.3742 percent ; Benefits of Countries which received U.S ODA > 250 million dollars: 0.3742 -1.4542 = -1.08 percent

According to our model, if the U.S ODA is larger than 250 million dollars, then the causal effect of that donation is -1.4542 percent less GDP growth for the recipient countries than if the recipient were going to receive less donation in 2012, when we hold other variables as constant. Yet, the effect of U.S ODA is insignificant becuse the p-value for -1.4542 is 0.13322, which is way higher than conventional p-value thresholds 0.01 or 0.05. It means we are unable to reject the null hypothesis that large donation and small donation don’t make a difference in GDP growth of the recipient country.

Evaluation of regression model
results <- data.frame(resid = resid(mod1), fitted = fitted(mod1))
        ggplot(results, aes(x = fitted, y = resid)) + 
          geom_point() + 
          geom_hline(yintercept = 0)

In the residual plot, we can see that the residuals are assigned relatively evenly across the x-axis; also, the spread is also relatively random. Therefore, we can be confident that there is no heteroskedasticity in our model.

Besides the non-linear model, we also want to investigate the results from the linear model; and we would want to compare the residual plot and the adjusted R squared to see which model we would choose.

linear regression model
linear <- lm(GDP_total_growth_rate~ as.factor(US_ODA_C)+ log(recipient_trade_world)+polity2+log(population_growth_rate),  data = aid2012)
## Warning in log(population_growth_rate): NaNs produced
summary(linear)
## 
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_ODA_C) + log(recipient_trade_world) + 
##     polity2 + log(population_growth_rate), data = aid2012)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.4973 -2.0910 -0.2106  1.8263 11.0633 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  8.73638    2.57299   3.395 0.000984 ***
## as.factor(US_ODA_C)1        -1.44675    0.92882  -1.558 0.122483    
## log(recipient_trade_world)   0.44822    0.17169   2.611 0.010426 *  
## polity2                     -0.02564    0.06562  -0.391 0.696791    
## log(population_growth_rate)  1.87865    0.47218   3.979 0.000131 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.319 on 100 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.1881, Adjusted R-squared:  0.1556 
## F-statistic: 5.791 on 4 and 100 DF,  p-value: 0.0003111
Evaluation of regression model
results <- data.frame(resid = resid(linear), fitted = fitted(linear))
        ggplot(results, aes(x = fitted, y = resid)) + 
          geom_point() + 
          geom_hline(yintercept = 0)

We can see that the residual plots of the non-linear model is slightly better than that of the linear model because the residuals spread out more randomly. Also, the adjusted R squared is 0.2315 for the non-linear model which is higher than that of the linear model. Since the performance of the non-linear model is better, we would choose the non-linear model for our analysis and sensitivity analysis as well.

Sensivity Analysis

Another Treatment variable

Despite the fact that only U.S ODA can directly connect with the outcome according to our causal discovery, it does not necessarily suggest that other variables can not affect the outcome. Therefore, we want to investigate another potential treatment: total amount of U.S aid; and we want to see its impact on the outcome.

ggplot(data = aid2012, aes(x = US_aid_total, y = GDP_total_growth_rate))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

According to the graph, we might still want to cut the U.S aid total at around 250 million dollars because there is an increasing trend between 0 and 250 million dollars.

aid2012 <- aid2012%>%
  mutate(US_aid_total_C = ifelse(US_aid_total<250, 0,1))
1. Population growth rate vs. GDP growth rate
p1 <- ggplot(data = aid2012, aes(x = population_growth_rate, y = GDP_total_growth_rate )) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,3), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )

According to the graph above, we might incorporate population growth rate as an non-linear variable in our model.

2. Trade balance of recipient countries vs. GDP growth rate
p2 <- ggplot(data = aid2012, aes(x = log(recipient_trade_world), y = GDP_total_growth_rate )) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,1), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
3. Politics score vs. GDP growth rate
p3 <- ggplot(data = aid2012, aes(x = polity2, y = GDP_total_growth_rate )) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,3), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
library("gridExtra")
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
grid.arrange(p1,p2,p3)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded

mod2 <- lm(GDP_total_growth_rate~ as.factor(US_aid_total_C) +log(recipient_trade_world)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2012)
summary(mod2)
## 
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_aid_total_C) + 
##     log(recipient_trade_world) + ns(polity2, 3) + ns(population_growth_rate, 
##     3), data = aid2012)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0977 -1.9686 -0.2409  1.5751 11.5972 
## 
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      0.5556     3.2356   0.172   0.8640    
## as.factor(US_aid_total_C)1      -1.0671     0.8860  -1.204   0.2312    
## log(recipient_trade_world)       0.4010     0.1927   2.081   0.0399 *  
## ns(polity2, 3)1                 -0.6691     1.4018  -0.477   0.6341    
## ns(polity2, 3)2                 -6.5911     3.5954  -1.833   0.0697 .  
## ns(polity2, 3)3                  0.5208     1.1004   0.473   0.6370    
## ns(population_growth_rate, 3)1   7.9876     1.5447   5.171 1.15e-06 ***
## ns(population_growth_rate, 3)2   4.3293     4.7997   0.902   0.3692    
## ns(population_growth_rate, 3)3   0.9468     3.0302   0.312   0.7553    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.22 on 103 degrees of freedom
## Multiple R-squared:  0.2719, Adjusted R-squared:  0.2154 
## F-statistic: 4.809 on 8 and 103 DF,  p-value: 4.782e-05
confint(mod2)
##                                       2.5 %     97.5 %
## (Intercept)                     -5.86151803  6.9727103
## as.factor(US_aid_total_C)1      -2.82418225  0.6900471
## log(recipient_trade_world)       0.01885467  0.7832220
## ns(polity2, 3)1                 -3.44925595  2.1109830
## ns(polity2, 3)2                -13.72177772  0.5395028
## ns(polity2, 3)3                 -1.66154435  2.7031677
## ns(population_growth_rate, 3)1   4.92409059 11.0510462
## ns(population_growth_rate, 3)2  -5.18963437 13.8483210
## ns(population_growth_rate, 3)3  -5.06298759  6.9565041

As we switch to using the total amount of U.S aid, we can see that when the total aid is smaller than 250 million dollars, the GDP growth rate would be expected to increaseby 0.5556 percent holding all else constant in 2007; however, if the aid is greater than 250 million dollars, the GDP growth rate would be -1.0671 percentage less than receiving small donations, which means large donation is less ideal as receiving small donation in regard to GDP growth. Yet, the effect is insignificant(p-value is 0.2312), so we cannot reject the null effect hypothesis. The results are consistent with our findings that use U.S ODA as the treatment.

4. Unmeasured Variable

A common robust test is to see how an unmeasured confounder could affect the result. We recognize that the effect of foreign aid on recipient’s economic growth can go through complicated paths, in addition to the ones we showed above. However, due to the lack of data and the complexity of the topic, our causal graph couldn’t capture all the plausible causal relationships. Therefore, we particular need to assume the existence of an unmeasured confounder and see how it would affect our prior result.

Potential unmeasured variables could include recipient’s historical development, recipient’s national diplomacy, the natrual resources the land of the recipient possesses…etc.

First, we need to model the treatment US_ODA_C. To do so, we would first want to visualize the association between U.S aid and other controlled variables. The goal of visualization is to find out if the association is linear or not and if we should introduce the nonlinearity into our model. For all graphs below, blue lines are dervied from the actual data, providing insights on how a variable affects if one receives the treatment or not. The red lines are used to depict that relationship.

Since the absolute value of trade balance of recipient countries is a relative big number compared to the foreign aid, we use the natural log of the trade balance so that we are able to see numbers on the same scale, so our analysis would not be misled by the big numbers. In this graph, we can see that the natural log of trade balance of recipient countries is non-linear.

Similarly, putting population on log scale helps fitting the model.

4.1. US_ODA_C~log(recipient_trade_world)

ggplot(data = aid2012, aes(x = log(recipient_trade_world), y = US_ODA_C)) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,2), method = "glm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

4.2. US_ODA_C~polity2

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

4.3. US_ODA_C~population

ggplot(data = aid2012, aes(x = log(population), y = US_ODA_C))+
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,2), method = "glm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

In this graph, we can see that association between U.S ODA and politics score is non-linear; therefore, this variable should also be non-linear in our model

sensitivity_analysis <- function(.data, model_A, model_Y, assoc_A, assoc_Y) {
    n <- nrow(.data)

    # Obtain residuals with residuals()
    # Obtain residual variances with sigma()
    res_A <- residuals(model_A)
    res_var_A <- sigma(model_A)^2
    res_Y <- residuals(model_Y)
    res_var_Y <- sigma(model_Y)^2

    # Compute the mean and variance of U given A and Y
    mean_U_term1 <- (assoc_A/res_var_A)*res_A
    mean_U_term2 <- (((res_var_A - assoc_A^2)*assoc_Y)/(res_var_A*res_var_Y))*res_Y
    mean_U <- mean_U_term1 + mean_U_term2

    var_U_term1 <- (res_var_A - assoc_A^2)/(res_var_A*res_var_Y)
    var_U_term2 <- res_var_Y - assoc_Y^2 + ((assoc_A*assoc_Y)^2)/res_var_A
    var_U <- var_U_term1*var_U_term2

    # Simulate U and add it to the data
    U <- rnorm(n, mean = mean_U, sd = sqrt(var_U))
    .data$U <- U

    ########################################################################
    # The part below is the only part you need to change to implement
    # the sensitivity analysis in a new context.

    # Refit model to estimate the causal effect 
    updated_model <- lm(GDP_total_growth_rate~ US_ODA_C +log(recipient_trade_world)+polity2+log(population_growth_rate)+U, data = .data)
    # The names of the coefficients and confidence interval output rows
    # are called "A" for the treatment variable A.
    # This will change in a new dataset.
    list(c(
        estimate = unname(coefficients(updated_model)["US_ODA_C"]), 
        ci_95_lower = confint(updated_model)["US_ODA_C",1],
        ci_95_upper = confint(updated_model)["US_ODA_C",2]
    ))
}


# Begin the sensitivity analysis

# Fit required models for the sensitivity analysis
mod_A <- lm(US_ODA_C ~ ns(polity2,3)+ns(log(recipient_trade_world),2)+ns(log(population),2), data = aid2012)
mod_Y <- lm(GDP_total_growth_rate~ US_ODA_C +ns(log(recipient_trade_world),2)+ns(polity2,3)+ns(population_growth_rate,3), data = aid2012)



summary(mod_A)
## 
## Call:
## lm(formula = US_ODA_C ~ ns(polity2, 3) + ns(log(recipient_trade_world), 
##     2) + ns(log(population), 2), data = aid2012)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.56039 -0.18527 -0.07231  0.04120  0.91509 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         -0.4016     0.2581  -1.556 0.122701    
## ns(polity2, 3)1                      0.0832     0.1462   0.569 0.570527    
## ns(polity2, 3)2                      0.4664     0.3786   1.232 0.220775    
## ns(polity2, 3)3                     -0.1759     0.1120  -1.570 0.119378    
## ns(log(recipient_trade_world), 2)1   0.3358     0.4278   0.785 0.434178    
## ns(log(recipient_trade_world), 2)2  -0.4180     0.2174  -1.923 0.057171 .  
## ns(log(population), 2)1              0.5985     0.5008   1.195 0.234696    
## ns(log(population), 2)2              1.0996     0.2937   3.744 0.000298 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3344 on 104 degrees of freedom
## Multiple R-squared:  0.2302, Adjusted R-squared:  0.1784 
## F-statistic: 4.442 on 7 and 104 DF,  p-value: 0.0002351
summary(mod_Y)
## 
## Call:
## lm(formula = GDP_total_growth_rate ~ US_ODA_C + ns(log(recipient_trade_world), 
##     2) + ns(polity2, 3) + ns(population_growth_rate, 3), data = aid2012)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0556 -1.9284 -0.3254  1.6768 11.0134 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                          0.3742     3.0684   0.122  0.90318    
## US_ODA_C                            -1.4542     0.9608  -1.514  0.13322    
## ns(log(recipient_trade_world), 2)1   7.0013     2.6501   2.642  0.00954 ** 
## ns(log(recipient_trade_world), 2)2   1.4048     1.3955   1.007  0.31646    
## ns(polity2, 3)1                     -0.5196     1.3877  -0.374  0.70886    
## ns(polity2, 3)2                     -5.4723     3.6145  -1.514  0.13312    
## ns(polity2, 3)3                      0.2544     1.0984   0.232  0.81732    
## ns(population_growth_rate, 3)1       7.6467     1.5332   4.987 2.52e-06 ***
## ns(population_growth_rate, 3)2       5.0143     4.7620   1.053  0.29483    
## ns(population_growth_rate, 3)3       1.0454     3.0372   0.344  0.73141    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.187 on 102 degrees of freedom
## Multiple R-squared:  0.2938, Adjusted R-squared:  0.2315 
## F-statistic: 4.715 on 9 and 102 DF,  p-value: 3.032e-05

Since the median residule for mod_A is -0.07231 and for mod_Y is -0.2106, we assume the unmeasure variable should have a positive impact on the outcome.

The black line denotes the ACE(US_ODA that’s larger than 250) under the specified U -> Y and U -> A assumption, and the grey in background indicates the range of ACE of the 95% CI. The blue line denotes the original ACE(without taking U into account).

  1. Assuming the intensity range of U -> Y and U -> A are reasonable, then regardless of the presence of U, the 95% confidence range of ACE that US_ODA has on GDP_total_growth_rate is below 0, which means there is a strong evidence that when US_ODAis larger than 250, the growth in the recipient’s contry is not as effective as US_ODA that’s smaller or equal to 250. In other words, the hypothetical U in this case is unlikely to qualitatively change our ACE.

This is counterintuitive, because usually we would assume more donation results in more obvious economic growth in the recipient’s country. A potential explanation we come up with is that poorer pre-aid condition of the recipient results in more donation, but the donation might not be used effectively due to the poor pre-aid condition. For example, according to prior knowledge, the contries receiving the most aid are Iraq, Jordan, Syria, where the political situation is not very stable. We think this might account for the counterintuitive “more donation –> less ideal” effect.

  1. Then what effect could U have? Within each U –> Y assumptioin, the various effect magnitude U –> A seems to result in more obvious negative effect when receiving large donation versus smaller donation.

  2. Within each U –> A, in general, the stronger the hypothetical effect of U –> Y, the less negative effect the large donation has on the GDP growth, but the trend is not obvious with small maginitude of U –>A. The confidence interval becomes entirely below 0 for strong U–>A, which means if the unmeasured variable affects the treatment a lot, then we are more confident in our evidence that larger donation results in less ideal GDP growth.

Therefore, our estimated ACE is quite robust even after considering a various causal flow that passes U, but insignificant.

Different Year (2007)

ggplot(data = aid2007, aes(x = US_ODA, y = GDP_total_growth_rate))+
  geom_point()+
  geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

According to the graph, we might need to cut US_ODA at around 100 million dollars because there is an increasing trend before 100 million dollars.

aid2007 <- aid2007%>%
  mutate(US_ODA_C = ifelse(US_ODA<100, 0,1))
1. Population growth rate vs. GDP growth rate
p4<- ggplot(data = aid2007, aes(x = population_growth_rate, y = GDP_total_growth_rate)) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,3), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
2. Trade balance of recipient countries vs. GDP growth rate
p5<- ggplot(data = aid2012, aes(x = log(recipient_trade_world), y = GDP_total_growth_rate)) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,2), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
3. Politics score vs. GDP growth rate
p6<- ggplot(data = aid2012, aes(x = polity2, y = GDP_total_growth_rate)) +
  geom_point() +
  geom_smooth(se = FALSE, color = "blue")+ 
  geom_smooth(formula = y~ns(x,3), method = "lm",
    method.args = list(family="binomial"),
    se = FALSE, color = "red" )
library("gridExtra")
grid.arrange(p4,p5,p6)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: In lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
##     ...) :
##  extra argument 'family' will be disregarded

mod3 <- lm(GDP_total_growth_rate~ as.factor(US_ODA_C) +ns(log(recipient_trade_world),2)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2007)
summary(mod3)
## 
## Call:
## lm(formula = GDP_total_growth_rate ~ as.factor(US_ODA_C) + ns(log(recipient_trade_world), 
##     2) + ns(polity2, 3) + ns(population_growth_rate, 3), data = aid2007)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.6617  -1.7567  -0.2156   2.0653  17.5414 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                         8.49439    3.18600   2.666  0.00885 **
## as.factor(US_ODA_C)1                1.04743    0.89680   1.168  0.24539   
## ns(log(recipient_trade_world), 2)1  6.18904    2.90626   2.130  0.03548 * 
## ns(log(recipient_trade_world), 2)2  0.03302    1.62126   0.020  0.98379   
## ns(polity2, 3)1                    -2.03635    1.58750  -1.283  0.20233   
## ns(polity2, 3)2                    -7.93164    4.07153  -1.948  0.05400 . 
## ns(polity2, 3)3                    -0.19390    1.25375  -0.155  0.87738   
## ns(population_growth_rate, 3)1     -3.21362    1.58371  -2.029  0.04490 * 
## ns(population_growth_rate, 3)2     -2.71532    4.72539  -0.575  0.56674   
## ns(population_growth_rate, 3)3      1.90060    2.20009   0.864  0.38957   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.784 on 108 degrees of freedom
## Multiple R-squared:  0.1659, Adjusted R-squared:  0.09637 
## F-statistic: 2.386 on 9 and 108 DF,  p-value: 0.01667
confint(mod3)
##                                          2.5 %      97.5 %
## (Intercept)                          2.1791863 14.80958767
## as.factor(US_ODA_C)1                -0.7301829  2.82503512
## ns(log(recipient_trade_world), 2)1   0.4283341 11.94974683
## ns(log(recipient_trade_world), 2)2  -3.1805861  3.24663212
## ns(polity2, 3)1                     -5.1830531  1.11035211
## ns(polity2, 3)2                    -16.0021293  0.13884684
## ns(polity2, 3)3                     -2.6790615  2.29125322
## ns(population_growth_rate, 3)1      -6.3528151 -0.07443251
## ns(population_growth_rate, 3)2     -12.0818574  6.65121774
## ns(population_growth_rate, 3)3      -2.4603566  6.26155491

When we use the data from 2007, we can see that if the U.S ODA is smaller than 100 million dollars, the GDP growth rate is expected to increase by 8.49439 percent while we holding other variables as constant; while if the U.S ODA is greater than 100 million dollars, the GDP growth rate of recipients would be expected to increase by 9.54182 percent holding other variable constant. However, the benefits of U.S ODA (<100 million dollars) is significant, which is different from our previous results using data from 2012. Also, the additional benefit of larger U.S ODA is also not significant; therefore, we can not conclude that the more the U.S ODA, the greater economic impact it can generate.

It also makes sense that our results differ across different years because the marginal benefit of additional foreign aid is very likely to be different. Since the marginal benefit can differ, then the estimated contribution to the economics should also be different for each year.

Another Measurement of Economic growth

mod4 <- lm(GDP_per_growth_rate ~ as.factor(US_ODA_C) +ns(log(recipient_trade_world),2)+ns(polity2,3)+ns(population_growth_rate,3),data = aid2012)
summary(mod4)
## 
## Call:
## lm(formula = GDP_per_growth_rate ~ as.factor(US_ODA_C) + ns(log(recipient_trade_world), 
##     2) + ns(polity2, 3) + ns(population_growth_rate, 3), data = aid2012)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.38535 -0.03160  0.00520  0.04228  0.18855 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                         0.03682    0.07467   0.493  0.62299   
## as.factor(US_ODA_C)1                0.06235    0.02338   2.667  0.00891 **
## ns(log(recipient_trade_world), 2)1  0.00185    0.06449   0.029  0.97717   
## ns(log(recipient_trade_world), 2)2 -0.01505    0.03396  -0.443  0.65855   
## ns(polity2, 3)1                    -0.04360    0.03377  -1.291  0.19960   
## ns(polity2, 3)2                    -0.13387    0.08796  -1.522  0.13112   
## ns(polity2, 3)3                    -0.01737    0.02673  -0.650  0.51735   
## ns(population_growth_rate, 3)1     -0.01401    0.03731  -0.375  0.70815   
## ns(population_growth_rate, 3)2      0.01591    0.11588   0.137  0.89110   
## ns(population_growth_rate, 3)3     -0.09342    0.07391  -1.264  0.20911   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.07755 on 102 degrees of freedom
## Multiple R-squared:  0.1028, Adjusted R-squared:  0.02366 
## F-statistic: 1.299 on 9 and 102 DF,  p-value: 0.2467
confint(mod4)
##                                          2.5 %     97.5 %
## (Intercept)                        -0.11128757 0.18492910
## as.factor(US_ODA_C)1                0.01597192 0.10872197
## ns(log(recipient_trade_world), 2)1 -0.12606828 0.12976876
## ns(log(recipient_trade_world), 2)2 -0.08240852 0.05230651
## ns(polity2, 3)1                    -0.11058279 0.02338367
## ns(polity2, 3)2                    -0.30834189 0.04059996
## ns(polity2, 3)3                    -0.07038620 0.03565296
## ns(population_growth_rate, 3)1     -0.08801248 0.06000037
## ns(population_growth_rate, 3)2     -0.21395049 0.24576214
## ns(population_growth_rate, 3)3     -0.24002735 0.05317901

We also switch to another common estimator of economic growth: GDP per capita growth rate, because a large GDP does not imply a large GDP per capita and vice versa; therefore, we want to use GDP per capita to see if our results would change substantially if we use another similar estimate of outcome. The coefficient of U.S ODA is still not significant and the magnitude remains small; but the only difference is that when U.S ODA is over 250 million dollars, there is insignificant benefit compared to a U.S ODA smaller than 250 million dollars. Overall, the benefits of U.S ODA are insignificant for the recipient countries, and this is also consistent with our findings using GDP growth rate.

Limitations

Limited Time line

In our model, we only used data in a particular year, which might not be persuasive or representative enough for us to make strong conclusion on the impact of U.S aid. For the future study, we might want to consider using a panel dataset and a time series model, such as the mixed effect model; then we might be able to generalize our conclusion on the impact of U.S aid.

Other important variables

We also know that there are various type of foreign aid, ODA and OOF. Also, within ODA and OOF, there are various type of aids as well. Some are goverment grants, and others might be loans. Since there are different forms of foreign aids and their characteristics might generate different impact on the economics of the recipient countries. Therefore, we might also want to examine the impact of various types of foreign aids, which can lead us to an alternative answer.

Causal Discovery and Regression

We used causal discovery to explore various forms of causal paths and graphs; however, we did not build our regression based on different conditional sets generated through causal discovery. Therefore, for future study, we can have different sets of conditional variables according to causal discovery and build regression models for each of those.

Reference

  1. Official development assistance (ODA) - Net ODA - OECD Data. (n.d.). Retrieved October 18, 2020, from https://data.oecd.org/oda/net-oda.htm